Accuracy is Not Agreement: Expert-Aligned Evaluation of Crash Narrative Classification Models
Bhagat, Sudesh Ramesh, Shihab, Ibne Farabi, Sharma, Anuj
This study investigates the relationship between deep learning (DL) model accuracy and expert agreement in classifying crash narratives. We evaluate five DL models, including BERT variants, the Universal Sentence Encoder (USE), and a zero-shot classifier, against expert-assigned labels on the same narratives, and extend the analysis to four large language models (LLMs): GPT-4, LLaMA 3, Qwen, and Claude. Our findings reveal an inverse relationship: models with higher technical accuracy often show lower agreement with human experts, while LLMs demonstrate stronger expert alignment despite lower accuracy. We use Cohen's Kappa and Principal Component Analysis (PCA) to quantify and visualize model-expert agreement, and employ SHAP analysis to explain misclassifications. Results show that expert-aligned models rely more on contextual and temporal cues than on location-specific keywords. These findings suggest that accuracy alone is insufficient for safety-critical NLP tasks. We argue for incorporating expert agreement into model evaluation frameworks and highlight the potential of LLMs as interpretable tools in crash analysis pipelines.
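The Cohen's Kappa statistic the abstract relies on corrects raw agreement for agreement expected by chance. A minimal sketch of the computation, using hypothetical crash-type labels (the label set and values are illustrative, not taken from the study's data):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected from each rater's marginals."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical expert vs. model labels for six crash narratives
expert = ["rear-end", "rear-end", "angle", "sideswipe", "angle", "rear-end"]
model  = ["rear-end", "angle",    "angle", "sideswipe", "angle", "rear-end"]
kappa = cohen_kappa(expert, model)
```

A model can post high accuracy yet a low kappa when the label distribution is skewed, which is one way "accuracy is not agreement" shows up in practice.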
Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches
Generative AI (GenAI) has revolutionized content generation, offering transformative capabilities for improving language coherence, readability, and overall quality. This manuscript explores the application of qualitative, quantitative, and mixed-methods research approaches to evaluate the performance of GenAI models in enhancing scientific writing. Using a hypothetical use case involving a collaborative medical imaging manuscript, we demonstrate how each method provides unique insights into the impact of GenAI. Qualitative methods gather in-depth feedback from expert reviewers, analyzing their responses using thematic analysis tools to capture nuanced improvements and identify limitations. Quantitative approaches employ automated metrics such as BLEU, ROUGE, and readability scores, as well as user surveys, to objectively measure improvements in coherence, fluency, and structure. Mixed-methods research integrates these strengths, combining statistical evaluations with detailed qualitative insights to provide a comprehensive assessment. These research methods enable quantifying improvement levels in GenAI-generated content, addressing critical aspects of linguistic quality and technical accuracy. They also offer a robust framework for benchmarking GenAI tools against traditional editing processes, ensuring the reliability and effectiveness of these technologies. By leveraging these methodologies, researchers can evaluate the performance boost driven by GenAI, refine its applications, and guide its responsible adoption in high-stakes domains like healthcare and scientific research. This work underscores the importance of rigorous evaluation frameworks for advancing trust and innovation in GenAI.
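Among the automated metrics the framework names, ROUGE is based on n-gram overlap between a candidate text and a reference. A minimal unigram (ROUGE-1) sketch in plain Python, with hypothetical example sentences (the sentences and variable names are illustrative, not from the manuscript's use case):

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1 as unigram overlap: returns (precision, recall, F1)."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped unigram matches
    p = overlap / sum(c.values())
    rec = overlap / sum(r.values())
    f1 = 2 * p * rec / (p + rec) if p + rec else 0.0
    return p, rec, f1

# hypothetical reference sentence vs. a GenAI-edited candidate
ref = "the model improves coherence and readability of the draft"
cand = "the model improves the readability and flow of the draft"
p, rec, f1 = rouge1(cand, ref)
```

Production evaluations would use an established implementation (and longer n-grams or longest-common-subsequence variants), but the overlap idea is the same.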
Unifying data and AI terms for all - ITU Hub
The world is witnessing rapid technological advances in the fields of data science and artificial intelligence (AI). From helping fight climate change to addressing all the other sustainable development goals of the United Nations, valuable use cases show how cutting-edge data and AI applications can improve our daily lives. At the same time, public awareness initiatives are still behind the curve, leaving many people feeling ambivalent about AI. Moreover, for non-technical readers, disparate definitions of data and AI terms can impede easy understanding of these dynamic fields. Despite global summits, educational publications, and ample media coverage, the fields of AI and data science stand to benefit from an agreed set of accessible definitions and terminologies.